{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "ced5edf1-4d1d-4e81-ba1e-58b26ac6ca8c",
   "metadata": {},
   "source": [
    "# LlamaIndex Platform Demo"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b450616c-ec1b-48ad-a5be-0e98a1123831",
   "metadata": {},
   "source": [
    "## Step 0: Setup environment config for platform"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "da7a0c2e-54d2-4e8c-9edc-c1c457fc24f9",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "os.environ[\"LLAMA_CLOUD_API_KEY\"] = \"your-api-key\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a89b27f6-d9ab-43f9-8c0d-56b68b89f5d5",
   "metadata": {},
   "source": [
    "## Step 1: Configure ingestion pipeline (data source, transformations)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "654f252d-d68d-4a91-8a40-5bffce8c61ab",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core.ingestion import IngestionPipeline\n",
    "from llama_index.core import SimpleDirectoryReader\n",
    "from llama_index.core.node_parser import SentenceSplitter\n",
    "from llama_index.embeddings.openai import OpenAIEmbedding"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "aef2e044-3188-4e46-9a3a-7105a223d07e",
   "metadata": {},
   "outputs": [],
   "source": [
    "reader = SimpleDirectoryReader(input_files=['data_sec/source_files/uber_2021.pdf'])\n",
    "docs = reader.load_data()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "8165a372-329b-46d4-9b27-9b146f19ff09",
   "metadata": {},
   "outputs": [],
   "source": [
    "sec_pipeline = IngestionPipeline(\n",
    "    project_name='sec analysis',\n",
    "    name='uber',\n",
    "    documents=docs,\n",
    "    transformations=[\n",
    "        SentenceSplitter(),\n",
    "        OpenAIEmbedding(),\n",
    "    ]\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "4a974962-6874-4edd-b1c1-386bf86d5f5c",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Pipeline available at: https://cloud.llamaindex.ai/project/a18cb1c9-393e-44bc-af5b-2cbc554c7a3f/playground/ca240136-f463-46d4-a7f9-70f9566c0b31\n"
     ]
    }
   ],
   "source": [
    "sec_pipeline_id = sec_pipeline.register()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d30e465d-bb09-4995-89bf-66c21a184108",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core.llama_dataset import LabelledRagDataset\n",
    "rag_dataset = LabelledRagDataset.from_json(\"./data_sec/rag_dataset.json\")\n",
    "questions = [example.query for example in rag_dataset.examples[:5]]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "05fedd5e-4d2c-4ce4-b90f-d735071aacd3",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1. According to the context information provided, what is the state of incorporation for Uber Technologies, Inc., and what is the company's IRS Employer Identification Number?\n",
      "2. Based on the information from the document, which type of annual report did Uber Technologies, Inc. file with the SEC for the fiscal year ended December 31, 2021, and on which stock exchange is Uber's Common Stock registered?\n",
      "3. According to the context information provided from the \"uber_2021.pdf\" document, what is the classification of the filer as indicated by the check mark, and what does this classification imply regarding the company's filing requirements?\n",
      "4. As of June 30, 2021, what was the aggregate market value of the voting and non-voting common equity held by non-affiliates of the registrant, and on which stock exchange was this value based?\n",
      "5. According to the table of contents in the \"UBER TECHNOLOGIES, INC.\" document, what are the main topics covered under Item 7 in Part II, and on which page does this section begin?\n"
     ]
    }
   ],
   "source": [
    "for ind, question in enumerate(questions):\n",
    "    print(f\"{ind + 1}. {question}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "f8009211-afe5-4ed2-95a4-41961cdcc447",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Uploaded 5 questions to dataset AI generated - 5 questions\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "'27d98fff-4dd2-4eb0-8904-24128082eeea'"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from llama_index.core.evaluation.eval_utils import upload_eval_dataset\n",
    "\n",
    "upload_eval_dataset(\n",
    "    project_name='sec analysis',\n",
    "    dataset_name='AI generated - 5 questions',\n",
    "    questions=questions,\n",
    "    overwrite=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "18d113bd-5afc-4def-9869-2a6271439cfb",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
